80 research outputs found

Pronominal anaphora in Basque: annotation of a real corpus

Author: Aduriz Itziar
Ceberio Klara
Díaz de Ilarraza Sánchez Arantza
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural (SEPLN)
Publication date: 25/02/2019

This paper describes the process followed in annotating pronominal anaphora in the Eus3LB corpus of Basque. Our aim is to use this annotation as the basis for later computational treatment of the language. We present the linguistic analysis carried out, the criteria defined for the tagging, and some relevant linguistic conclusions about the features of the antecedents needed to link them correctly to their anaphoric elements.

Diposit Digital de la Universitat de Barcelona

Pronominal Anaphora in Basque: computational point of view and the development of a corpus

Author: Aduriz Itziar
Ceberio Klara
Díaz de Ilarraza Sánchez Arantza
Publication venue: Universidad del País Vasco / Euskal Herriko Unibertsitatea
Publication date: 11/11/2019

This paper describes the process of annotating pronominal anaphora in a 54,000-word corpus of Basque. Our aim is to use this annotation as a basis for later computational processing. The linguistic study carried out and the criteria defined for the tagging process are also presented in the paper.

Diposit Digital de la Universitat de Barcelona

The Corpus of Basque Simplified Texts (CBST)

Author: Aranzabe Urruzola María Jesús
Díaz de Ilarraza Sánchez Arantza
González Dios Itziar
Publication venue: Springer Science and Business Media LLC
Publication date: 01/03/2018

In this paper we present the Corpus of Basque Simplified Texts. The corpus compiles 227 original sentences from the science popularisation domain and two simplified versions of each sentence. The simplified versions were created following two different approaches: the structural approach, by a court translator who follows easy-to-read guidelines, and the intuitive approach, by a teacher drawing on her experience. The aim of the corpus is to enable a comparative analysis of simplified text. To that end, we also present the annotation scheme we have created to annotate the corpus. The scheme is divided into eight macro-operations: delete, merge, split, transformation, insert, reordering, no operation and other. These macro-operations are in turn subdivided into different operations. We also relate our work and results to other languages. The corpus will be used to corroborate the decisions taken and to improve the design of the automatic text simplification system for Basque. Itziar Gonzalez-Dios's work was funded by a Ph.D. grant from the Basque Government and a postdoctoral grant for new doctors from the Vice-rectory of Research of the University of the Basque Country (UPV/EHU). We are very grateful to the translator and the teacher who simplified the texts. We also want to thank Dominique Brunato, Felice Dell'Orletta and Giulia Venturi for their help with the Italian annotation scheme and their suggestions when analysing the corpus, and Oier Lopez de Lacalle for his help with the statistical analysis. We also want to express our gratitude to the anonymous reviewers for their comments and suggestions. This research was supported by the Basque Government (IT344-10) and the Spanish Ministry of Economy and Competitiveness, EXTRECM Project (TIN2013-46616-C2-1-R).

Archivo Digital para la Docencia y la Investigación

Primeros pasos en la anotación tanto manual como automática de informes clínicos en Español

Author: Díaz de Ilarraza Sánchez Arantza
Oronoz Anchordoqui Maite
Torices Ortzi
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural
Publication date: 01/01/2010

This paper presents the annotation of a corpus of gynaecology and obstetrics patient records and the development of an annotation scheme for its manual tagging. We focus our description on the manual annotation of the clinical notes and on the adaptation of the Natural Language Processing analyzer Freeling to the medical domain.

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Euskarazko anafora pronominala: ikuspuntu konputazionala eta corpus baten garapena

Author: Aduriz Itziar
Ceberio Berger Klara
Díaz de Ilarraza Sánchez Arantza
Publication venue: Servicio Editorial de la Universidad del País Vasco/Euskal Herriko Unibertsitatearen Argitalpen Zerbitzua
Publication date: 01/01/2005

Archivo Digital para la Docencia y la Investigación

Euskarazko denbora-egiturak etiketatzeko gidalerroak v2.0

Author: Altuna Díaz Begoña
Aranzabe Urruzola María Jesús
Díaz de Ilarraza Sánchez Arantza
Publication date: 11/02/2016

To interpret the temporal information in texts, a mark-up language that encodes that information is needed, so that it can be accessed automatically. The most widely used mark-up language is TimeML (Pustejovsky et al., 2003), which has also been chosen for Basque. In these guidelines we present the Basque version of ISO-TimeML (ISO-TimeML working group, 2008). After analysing the tags, attributes and values created for English, we describe those most appropriate for representing the information in Basque temporal structures.

Archivo Digital para la Docencia y la Investigación

EXTracción de RElaciones entre Conceptos Médicos en fuentes de información heterogéneas (EXTRECM)

Author: Araujo Serna Lourdes
Díaz de Ilarraza Sánchez Arantza
Gojenola Galletebeitia Koldo
Martínez Unanue Raquel
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural
Publication date: 01/01/2015

This project addresses the extraction of relationships between medical concepts in scientific documents, medical records and general information on the Internet, in several languages, using advanced Natural Language Processing and Information Retrieval techniques and tools. The project aims to show, through two use cases, the benefits of applying language technology in the health sector. Project references: TIN2013-46616-C2-1-R, TIN2013-46616-C2-2-R

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Euskarazko denbora-egiturak etiketatzeko gidalerroak v1.0

Author: Altuna Díaz Begoña
Aranzabe Urruzola María Jesús
Díaz de Ilarraza Sánchez Arantza
Publication date: 01/12/2014

To interpret the temporal information in texts, a mark-up language that encodes that information is needed, so that it can be accessed automatically. The most widely used mark-up language is TimeML (Pustejovsky et al., 2003), which has also been chosen for Basque. In these guidelines we present the Basque version of ISO-TimeML (ISO-TimeML working group, 2008). After analysing the tags, attributes and values created for English, we describe those most appropriate for representing the information in Basque temporal structures.

Archivo Digital para la Docencia y la Investigación

Erroreak automatikoki detektatzeko tekniken azterlana eta euskararentzako aplikazioak

Author: Díaz de Ilarraza Sánchez Arantza
Gojenola Galletebeitia Koldobika
Oronoz Anchordoqui Maite
Publication venue: Servicio Editorial de la Universidad del País Vasco/Euskal Herriko Unibertsitatearen Argitalpen Zerbitzua
Publication date: 01/01/2009

In this article, we study the techniques used for detecting errors in Natural Language Processing (NLP). We classify the techniques according to their approach (symbolic or empirical) and then describe them in depth. We then describe the systems we have developed for detecting syntactic errors in Basque, classifying those systems by the technique they use and illustrating them with examples.

Archivo Digital para la Docencia y la Investigación

Universidad del País Vasco / Euskal Herriko Unibertsitatea: Ciencia - Portal de revistas digitales de la UPV/EHU